Multi-stage heterogeneous ensemble meta-learning with hands-off user-interface and stand-alone prediction using principal components regression: The R package EnsemblePCReg

Authors

  • Mansour T.A. Sharabiani
  • Alireza S. Mahani
Abstract

Although ensemble meta-learning of a heterogeneous collection of base learners is an effective means to reduce the generalization error in predictive models, several factors have impeded a broad adoption of such techniques among practitioners. These factors include an intractable number of choices of base learners and their tuning parameters, complex methodology required for integration of base learners, the ensuing complexity of software needed to support stand-alone prediction, and significant CPU and memory consumption of heterogeneous ensemble meta-learning techniques. The R package EnsemblePCReg overcomes the above barriers by combining several features. Sensible base-learner parameter grids provide a hands-off API for non-experts while allowing expert users to exert control by overriding default settings. Sophisticated ensemble generation and integration methods, combining stacked generalization and principal components regression, offer favorable generalization performance. Finally, computational optimizations, such as advanced thread scheduling for improved parallelization scaling and file methods for relieving memory consumption during training and prediction, significantly increase the range of data sizes that can be handled on personal computers. In combining these features, EnsemblePCReg significantly lowers the barrier for practitioners to apply heterogeneous ensemble meta-learning techniques to their everyday regression problems.
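
The integration idea named in the abstract (stacked generalization followed by principal components regression) can be illustrated with a minimal sketch. This is not the EnsemblePCReg implementation or its API: the sketch is written in Python with NumPy rather than R, and every name in it (`fit_ridge`, `fit_knn`, `out_of_fold`, `fit_pcr`, `stacked_pcr`), the two toy base learners, the parameter grids, and the synthetic data are assumptions chosen only for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam):
    # Hypothetical base learner 1: ridge regression with centered inputs.
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu_y))
    return lambda Z: (Z - mu_x) @ w + mu_y

def fit_knn(X, y, k):
    # Hypothetical base learner 2: k-nearest-neighbour regression.
    def predict(Z):
        d = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def out_of_fold(fit, X, y, folds):
    # Stacked generalization: each row is predicted by a model that never saw it.
    pred = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        pred[test] = fit(X[train], y[train])(X[test])
    return pred

def fit_pcr(P, y, n_comp):
    # Principal components regression on the matrix of base-learner predictions.
    mu_p, mu_y = P.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(P - mu_p, full_matrices=False)
    V = Vt[:n_comp].T                      # leading principal directions
    scores = (P - mu_p) @ V
    gamma, *_ = np.linalg.lstsq(scores, y - mu_y, rcond=None)
    return lambda Q: (Q - mu_p) @ V @ gamma + mu_y

def stacked_pcr(X, y, learners, n_comp, k=5, seed=0):
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    P = np.column_stack([out_of_fold(f, X, y, folds) for f in learners])
    integrator = fit_pcr(P, y, n_comp)
    full = [f(X, y) for f in learners]     # refit base learners on all rows
    return lambda Z: integrator(np.column_stack([m(Z) for m in full]))

# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# A small heterogeneous grid: two ridge settings and two k-NN settings.
learners = [lambda X, y, lam=lam: fit_ridge(X, y, lam) for lam in (0.1, 10.0)]
learners += [lambda X, y, k=k: fit_knn(X, y, k) for k in (3, 10)]

model = stacked_pcr(X[:150], y[:150], learners, n_comp=2)
pred = model(X[150:])
rmse = float(np.sqrt(np.mean((pred - y[150:]) ** 2)))
print(f"hold-out RMSE: {rmse:.3f}")
```

Because base-learner predictions tend to be strongly correlated, regressing on a few leading principal components rather than on the raw prediction columns is one way to keep the integration step stable; the number of components (`n_comp` here) is a tuning choice that this sketch fixes by hand.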

Similar resources

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

Development of an Ensemble Multi-stage Machine for Prediction of Breast Cancer Survivability

Prediction of cancer survivability using machine learning techniques has become a popular approach in recent years. In this regard, an important issue is that preparing some features may require difficult and costly experiments, even though these features have little impact on the final decision and can be dropped from the feature set. Therefore, developing a machine for p...

Ensemble Kernel Learning Model for Prediction of Time Series Based on the Support Vector Regression and Meta Heuristic Search

In this paper, a method for predicting time series is presented. Time series prediction is a process that predicts future system values based on information obtained from past and present data points. Time series prediction models are widely used in various fields of engineering, economics, etc. The main purpose of using different models for time series prediction is to make the forecast with...

Machine learning algorithms in air quality modeling

Modern studies in the field of environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning approaches have emerged as a solution to these issues. It is a fact that input variable type largely affec...

The pls Package: Principal Component and Partial Least Squares Regression in R

The pls package implements principal component regression (PCR) and partial least squares regression (PLSR) in R (R Development Core Team 2006b), and is freely available from the Comprehensive R Archive Network (CRAN), licensed under the GNU General Public License (GPL). The user interface is modelled after the traditional formula interface, as exemplified by lm. This was done so that people us...

Publication date: 2016